theroggy
Member

In read_dataframe without Arrow, the number of rows in the result was counted first, and only then was the full data read.

Counting the rows can take significant time, especially when a filter is used. If the filter excludes most of the rows, the count can even take as long as the subsequent read of all the data.

This PR removes the row count before reading to improve performance.
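To illustrate why dropping the count helps, here is a minimal pure-Python sketch. It is not the pyogrio or GDAL code path: `CountingFilter`, `read_with_precount`, and `read_single_pass` are hypothetical names, and the in-memory list of tuples only stands in for a filtered layer. It assumes the filter has to be evaluated per row in both the count and the read, which is the case the description above is concerned with.

```python
from typing import Callable, List, Sequence, Tuple

Row = Tuple[int, str]


class CountingFilter:
    """Wraps a predicate and records how often it is evaluated (hypothetical helper)."""

    def __init__(self, predicate: Callable[[Row], bool]) -> None:
        self.predicate = predicate
        self.calls = 0

    def __call__(self, row: Row) -> bool:
        self.calls += 1
        return self.predicate(row)


def read_with_precount(rows: Sequence[Row], keep: Callable[[Row], bool]) -> List[Row]:
    """Old approach (sketch): count matching rows, preallocate, then read."""
    # Pass 1: evaluate the filter on every row just to learn the result size.
    n = sum(1 for row in rows if keep(row))
    out: List[Row] = [(0, "")] * n
    # Pass 2: evaluate the filter on every row again to fill the output.
    i = 0
    for row in rows:
        if keep(row):
            out[i] = row
            i += 1
    return out


def read_single_pass(rows: Sequence[Row], keep: Callable[[Row], bool]) -> List[Row]:
    """New approach (sketch): read once and let the output grow as needed."""
    out: List[Row] = []
    for row in rows:
        if keep(row):
            out.append(row)
    return out


# Stand-in data: 1000 rows, of which the filter keeps 10.
rows = [(i, f"feature-{i}") for i in range(1000)]


def is_selected(row: Row) -> bool:
    return row[0] % 100 == 0


precount_filter = CountingFilter(is_selected)
single_filter = CountingFilter(is_selected)

precount_result = read_with_precount(rows, precount_filter)
single_result = read_single_pass(rows, single_filter)

precount_calls = precount_filter.calls
single_calls = single_filter.calls
print(len(single_result), precount_calls, single_calls)
```

Both functions return the same rows, but the pre-count variant evaluates the filter twice per input row. The trade-off in the single-pass variant is that the output size is not known up front, so the result container has to grow while reading.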

@theroggy theroggy marked this pull request as ready for review September 13, 2025 15:49
@theroggy theroggy marked this pull request as draft September 13, 2025 15:50
@theroggy theroggy marked this pull request as ready for review September 13, 2025 20:30
@theroggy theroggy modified the milestones: 0.11.0, 0.12.0 Sep 14, 2025